Information Theoretic Model Selection for Pattern Analysis

Authors

  • Joachim M. Buhmann
  • Morteza Haghir Chehreghani
  • Mario Frank
  • Andreas P. Streich
Abstract

Exploratory data analysis requires (i) defining a set of patterns hypothesized to exist in the data, (ii) specifying a suitable quantification principle or cost function to rank these patterns, and (iii) validating the inferred patterns. For data clustering, the patterns are partitionings of the objects into k groups; for PCA or truncated SVD, the patterns are orthogonal transformations with projections to a low-dimensional space. We propose an information theoretic principle for model selection and model-order selection. Our principle ranks competing pattern cost functions according to their ability to extract context-sensitive information from noisy data with respect to the chosen hypothesis class. Sets of approximate solutions serve as a basis for a communication protocol. Analogous to Buhmann (2010), inferred models maximize the so-called approximation capacity, which is the mutual information between coarsened training data patterns and coarsened test data patterns. We demonstrate how to apply our validation framework using the well-known Gaussian mixture model and a multi-label clustering approach for role mining in binary user privilege assignments.
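The abstract describes approximation capacity as a mutual information between patterns inferred from training data and from test data. The abstract does not give the definition itself, so the sketch below is not the authors' criterion. It only shows the plain empirical mutual information between two label sequences, as a reminder of the underlying quantity. The function name `mutual_information` and the toy label lists are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Empirical mutual information, in bits, of two equal-length label sequences.

    Illustrative only: this is the textbook plug-in estimate, not the
    approximation capacity defined in the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    n = len(labels_a)
    if n == 0:
        raise ValueError("label sequences must be non-empty")

    count_a = Counter(labels_a)                  # marginal counts of the first labeling
    count_b = Counter(labels_b)                  # marginal counts of the second labeling
    count_ab = Counter(zip(labels_a, labels_b))  # joint counts

    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy labelings of the same four objects.
train_labels = [0, 0, 1, 1]
test_labels = [1, 1, 0, 0]   # same partition, cluster names swapped
unrelated = [0, 1, 0, 1]     # statistically independent of train_labels

print(mutual_information(train_labels, test_labels))
print(mutual_information(train_labels, unrelated))
```

Mutual information does not depend on how the clusters are named, so two labelings that describe the same partition score as high as possible, while independent labelings score zero.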

Similar resources

Combination of real options and game-theoretic approach in investment analysis

Investments in technology create a large amount of capital investments by major companies. Assessing such investment projects is identified as critical to the efficient allocation of resources. Viewing investment projects as real options, this paper expands a method for assessing technology investment decisions under the joint presence of uncertainty and competition. It combines the game-theore...

The Application of Systems-Theoretic Accident Model and Process in the Systematic Nonlinear Analysis of Accidents in Car Industry

Background & objectives: Hundreds of methods have been introduced to analyze various events. Hence one of the effective and principal steps in accident analysis is the proper and targeted selection of an accident analysis method. Traditional methods of accident analysis in complex industries are not comprehensive and examine each component of the system separately. So, the use of new systematic metho...

Response to Kanatani

We discuss the advantages and disadvantages of two approaches to model selection: the information theoretic method suggested by Kanatani [3] and others, and our heuristic sequential selection method [9]. keywords: curve segmentation, model selection, line, ellipse

A theoretical investigation of several model selection criteria for dimensionality reduction

Based on the problem of determining the hidden dimensionality (or the number of latent factors) of Factor Analysis (FA) model, this paper provides a theoretic comparison on several classical model selection criteria, including Akaike’s Information Criterion (AIC), Bozdogan’s Consistent Akaike’s Information Criterion (CAIC), Hannan–Quinn information criterion (HQC), Schwarz’s Bayesian Informatio...

Information theoretic combination of pattern classifiers

Combining several classifiers has proved to be an effective machine learning technique. Two concepts clearly influence the performances of an ensemble of classifiers: the diversity between classifiers and the individual accuracies of the classifiers. In this paper we propose an information theoretic framework to establish a link between these quantities. As they appear to be contradictory, we p...

Publication date: 2012